{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<hr />\n",
    "<h1 align=\"center\">Snapshots: how to use them and why they are useful</h1>\n",
    "<center><img src=\"https://raw.githubusercontent.com/man-group/ArcticDB/master/static/ArcticDBCropped.png\" alt=\"ArcticDB Logo\" width=\"400\">\n",
    "<hr />"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### An Introduction to Snapshots\n",
    "In order to understand snapshots we first need to be clear about versions.\n",
    "\n",
    "In ArcticDB, every time a change is made to a symbol a new version is created. So each symbol has a sequence of versions through time.\n",
    "\n",
    "In a library there will typically be many symbols with each having many versions.\n",
    "\n",
    "Suppose we reach a point where we wish to record the current state of the data in the library. This is exactly the purpose of a snapshot.\n",
    "\n",
    "*A snapshot records the current versions of all the symbols in the library (or a custom set of versions, see below)*\n",
    "\n",
    "The data recorded in the snapshot can then be read back using the `as_of` parameter in the read.\n",
    "\n",
    "Versions that are part of a snapshot are protected from deletion, even if their symbol is deleted.\n",
    "\n",
    "Below is a simple example that demonstrates snapshots in action.\n",
    "\n",
    "<hr />"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Installs and Imports"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install arcticdb"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import logging\n",
    "import arcticdb as adb"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Set up ArticDB\n",
    "<b>Note</b>: In this example we delete the library if it exists. That is not normal but we want to make sure we have a clean library in this case.\n",
    "\n",
    "Don't copy those lines unless you are sure that is what you need."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "lib_name = 'demo'\n",
    "arctic = adb.Arctic(\"lmdb://arcticdb_snapshot_demo\")\n",
    "if lib_name in arctic.list_libraries():\n",
    "    arctic.delete_library(lib_name)\n",
    "lib = arctic.get_library('demo', create_if_missing=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create some symbols"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['sym_0', 'sym_1', 'sym_2', 'sym_3']\n",
      "['sym_0', 'sym_1']\n"
     ]
    }
   ],
   "source": [
    "num_symbols = 4\n",
    "symbols = [f\"sym_{idx}\" for idx in range(num_symbols)]\n",
    "half_symbols = symbols[:num_symbols // 2]\n",
    "print(symbols)\n",
    "print(half_symbols)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "# write data for each symbol\n",
    "for idx, symbol in enumerate(symbols):\n",
    "    lib.write(symbol, pd.DataFrame({\"col\": [idx]}))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "# write data only for the first half of the symbols\n",
    "for idx, symbol in enumerate(half_symbols):\n",
    "    lib.write(symbol, pd.DataFrame({\"col\": [idx+10]}))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create the snapshot\n",
    "\n",
    "The metadata is optional"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "lib.snapshot(\"snapshot_0\", metadata=\"this is the core of the demo\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Functions to discover and inspect snapshots"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'snapshot_0': 'this is the core of the demo'}"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# list all snapshots\n",
    "lib.list_snapshots()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['sym_2', 'sym_1', 'sym_0', 'sym_3']"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# list the symbols in a snapshot\n",
    "lib.list_symbols(snapshot_name=\"snapshot_0\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{sym_3_v0: (date=2023-11-20 10:24:45.103129257+00:00, snapshots=['snapshot_0']),\n",
       " sym_2_v0: (date=2023-11-20 10:24:45.086132551+00:00, snapshots=['snapshot_0']),\n",
       " sym_1_v1: (date=2023-11-20 10:24:45.431966093+00:00, snapshots=['snapshot_0']),\n",
       " sym_0_v1: (date=2023-11-20 10:24:45.413203317+00:00, snapshots=['snapshot_0'])}"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# list the versions in a snapshot\n",
    "lib.list_versions(snapshot=\"snapshot_0\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{sym_3_v0: (date=2023-11-20 10:24:45.103129257+00:00, snapshots=['snapshot_0']),\n",
       " sym_2_v0: (date=2023-11-20 10:24:45.086132551+00:00, snapshots=['snapshot_0']),\n",
       " sym_1_v1: (date=2023-11-20 10:24:45.431966093+00:00, snapshots=['snapshot_0']),\n",
       " sym_1_v0: (date=2023-11-20 10:24:45.066268214+00:00),\n",
       " sym_0_v1: (date=2023-11-20 10:24:45.413203317+00:00, snapshots=['snapshot_0']),\n",
       " sym_0_v0: (date=2023-11-20 10:24:45.041944641+00:00)}"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# list all versions in the library, with associated snapshots\n",
    "lib.list_versions()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Reading a snapshot version of a symbol"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "VersionedItem(symbol='sym_0', library='demo', data=<class 'pandas.core.frame.DataFrame'>, version=1, metadata=None, host='LMDB(path=/users/isys/nclarke/jupyter/arctic/demos/arcticdb_snapshot_demo)')\n",
      "   col\n",
      "0   10\n"
     ]
    }
   ],
   "source": [
    "vit = lib.read(\"sym_0\", as_of=\"snapshot_0\")\n",
    "print(vit)\n",
    "print(vit.data)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "VersionedItem(symbol='sym_3', library='demo', data=<class 'pandas.core.frame.DataFrame'>, version=0, metadata=None, host='LMDB(path=/users/isys/nclarke/jupyter/arctic/demos/arcticdb_snapshot_demo)')\n",
      "   col\n",
      "0    3\n"
     ]
    }
   ],
   "source": [
    "vit = lib.read(\"sym_3\", as_of=\"snapshot_0\")\n",
    "print(vit)\n",
    "print(vit.data)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Demonstration that snapshot versions are protected from deletion"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [],
   "source": [
    "# delete the symbol sym_0\n",
    "lib.delete(\"sym_0\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['sym_2', 'sym_1', 'sym_3']"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# show that sym_0 has been deleted\n",
    "lib.list_symbols()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{sym_3_v0: (date=2023-11-20 10:24:45.103129257+00:00, snapshots=['snapshot_0']),\n",
       " sym_2_v0: (date=2023-11-20 10:24:45.086132551+00:00, snapshots=['snapshot_0']),\n",
       " sym_1_v1: (date=2023-11-20 10:24:45.431966093+00:00, snapshots=['snapshot_0']),\n",
       " sym_1_v0: (date=2023-11-20 10:24:45.066268214+00:00)}"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# sym_0 does not appear in the current library versions\n",
    "lib.list_versions()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "VersionedItem(symbol='sym_0', library='demo', data=<class 'pandas.core.frame.DataFrame'>, version=1, metadata=None, host='LMDB(path=/users/isys/nclarke/jupyter/arctic/demos/arcticdb_snapshot_demo)')\n",
      "   col\n",
      "0   10\n"
     ]
    }
   ],
   "source": [
    "# however we can still read the version of sym_0 that was recorded in the snapshot\n",
    "vit = lib.read(\"sym_0\", as_of=\"snapshot_0\")\n",
    "print(vit)\n",
    "print(vit.data)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Although it works, we advise not to read snapshot versions directly using the version number\n",
    "These versions only exist because they are in a snapshot, so it is much more obvious to code to access them via the snapshot.\n",
    "\n",
    "Accessing snapshot protected versions via the version number leads to code that will fail (if the snapshot is deleted) in a way that is difficult to understand."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "VersionedItem(symbol='sym_0', library='demo', data=<class 'pandas.core.frame.DataFrame'>, version=1, metadata=None, host='LMDB(path=/users/isys/nclarke/jupyter/arctic/demos/arcticdb_snapshot_demo)')\n",
      "   col\n",
      "0   10\n"
     ]
    }
   ],
   "source": [
    "vit = lib.read(\"sym_0\", as_of=1)\n",
    "print(vit)\n",
    "print(vit.data)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "ERROR:root:Version not found\n"
     ]
    }
   ],
   "source": [
    "# version 0 was not in the snapshot, so it has been removed\n",
    "try:\n",
    "    vit = lib.read(\"sym_0\", as_of=0)\n",
    "    print(vit)\n",
    "    print(vit.data)\n",
    "except adb.exceptions.NoSuchVersionException:\n",
    "    logging.error(\"Version not found\")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Deleting a snapshot\n",
    "When we delete a snapshot, any versions that are only referenced by that snapshot will be deleted."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [],
   "source": [
    "lib.delete_snapshot(\"snapshot_0\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{}"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "lib.list_snapshots()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "ERROR:root:Version not found\n"
     ]
    }
   ],
   "source": [
    "# version 1, which was kept as part of the snapshot, has now been deleted\n",
    "try:\n",
    "    vit = lib.read(\"sym_0\", as_of=1)\n",
    "    print(vit)\n",
    "    print(vit.data)\n",
    "except adb.exceptions.NoSuchVersionException:\n",
    "    logging.error(\"Version not found\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{sym_3_v0: (date=2023-11-20 10:24:45.103129257+00:00),\n",
       " sym_2_v0: (date=2023-11-20 10:24:45.086132551+00:00),\n",
       " sym_1_v1: (date=2023-11-20 10:24:45.431966093+00:00),\n",
       " sym_1_v0: (date=2023-11-20 10:24:45.066268214+00:00)}"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "lib.list_versions()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Snapshot names must be unique\n",
    "Creating a snapshot with a name that already has a snapshot causes an error."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [],
   "source": [
    "lib.snapshot(\"snapshot_1\", metadata=\"demo snapshot names need to be unique\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "ERROR:root:E_ASSERTION_FAILURE Snapshot with name snapshot_1 already exists\n"
     ]
    }
   ],
   "source": [
    "try:\n",
    "    lib.snapshot(\"snapshot_1\")\n",
    "except Exception as e:\n",
    "    logging.error(e)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'snapshot_1': 'demo snapshot names need to be unique'}"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "lib.list_snapshots()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Modifiers for snapshot creation: exclude or include symbols"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [],
   "source": [
    "# exclude sym_1 from snapshot\n",
    "lib.snapshot(\"snapshot_2\", skip_symbols=[\"sym_1\"], metadata=\"demo skip_symbols\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{sym_3_v0: (date=2023-11-20 10:24:45.103129257+00:00, snapshots=['snapshot_1', 'snapshot_2']),\n",
       " sym_2_v0: (date=2023-11-20 10:24:45.086132551+00:00, snapshots=['snapshot_1', 'snapshot_2']),\n",
       " sym_1_v1: (date=2023-11-20 10:24:45.431966093+00:00, snapshots=['snapshot_1']),\n",
       " sym_1_v0: (date=2023-11-20 10:24:45.066268214+00:00)}"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "lib.list_versions()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [],
   "source": [
    "# include specific versions of sym_1 and sym_2 from snapshot\n",
    "lib.snapshot(\"snapshot_3\", versions={\"sym_1\": 0, \"sym_2\": 0}, metadata=\"demo versions\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{sym_2_v0: (date=2023-11-20 10:24:45.086132551+00:00, snapshots=['snapshot_1', 'snapshot_2', 'snapshot_3']),\n",
       " sym_1_v0: (date=2023-11-20 10:24:45.066268214+00:00, snapshots=['snapshot_3'])}"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "lib.list_versions(snapshot=\"snapshot_3\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'snapshot_1': 'demo snapshot names need to be unique',\n",
       " 'snapshot_2': 'demo skip_symbols',\n",
       " 'snapshot_3': 'demo versions'}"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "lib.list_snapshots()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<hr />\n",
    "\n",
    "### Snapshots: why and why not to use them\n",
    "\n",
    "#### Why\n",
    "\n",
    "* Snapshots record the current state of the library\n",
    "* They can be thought of as recoverable checkpoints in the evolution of the data\n",
    "* Snapshots can create an audit trail\n",
    "* Snapshots protect their data from deletion by other activity in the library\n",
    "\n",
    "#### Why Not\n",
    "\n",
    "* Generally we encourage the use of snapshots\n",
    "* However if many snapshots are created they can impose a slight performance penalty on some operations due to the deletion protection\n",
    "* Snapshots can also increase the storage used by ArcticDB, through protecting older versions that would otherwise be deleted\n",
    "* Use snapshots in a considered fashion and delete them when they are no longer needed\n",
    "\n",
    "\n",
    "<hr />"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Further Info / Extras\n",
    "\n",
    "For full descriptions of the functions used above, please see the ArcticDb documentation:\n",
    "\n",
    "* `snapshot()` https://docs.arcticdb.io/latest/api/library/#arcticdb.version_store.library.Library.snapshot\n",
    "* `list_snapshots()` https://docs.arcticdb.io/latest/api/library/#arcticdb.version_store.library.Library.list_snapshots\n",
    "* `list_versions()` https://docs.arcticdb.io/latest/api/library/#arcticdb.version_store.library.Library.list_versions"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
